Abstract: We study optimal user-network association in an integrated 802.11 WLAN and
3G-UMTS hybrid cell. Assuming saturated resource allocation on the downlink of the WLAN
and UMTS networks and a single QoS class of mobiles arriving at an average location in the
hybrid cell, we formulate the problem with two different approaches: Global and Individual
optimality. The Globally optimal association is formulated as an SMDP (Semi-Markov
Decision Process) connection routing decision problem in which rewards comprise a financial
gain component and an aggregate network throughput component. The corresponding Dynamic
Programming equations are solved using the Value Iteration method, and a stationary
optimal policy whose switching curve is neither convex nor concave is obtained.
Threshold-type and symmetric switching curves are observed for the analogous homogeneous
network cases. Individual optimality is studied under a non-cooperative dynamic game
framework with the expected service time of a mobile as the decision cost criterion. It is shown
that individual optimality in a WLAN-UMTS hybrid cell results in a threshold policy curve
of descending staircase form with increasing Poisson arrival rate of mobiles.
User-Network Association in a WLAN-UMTS Hybrid Cell: Global & Individual Optimality

Summary: We study optimal user-network association in a hybrid 802.11 WLAN and 3G-UMTS
cell. Assuming that resource allocation saturates the downlink of the WLAN and UMTS
networks, and that the mobiles are all located at the same average position in the hybrid
cell and belong to the same quality-of-service class, we formulate the problem according to
two different approaches: global and individual optimality. The globally optimal association
is formulated as an SMDP (Semi-Markov Decision Process) routing decision problem in which
the rewards comprise a financial gain component and an aggregate network throughput
component. The corresponding dynamic programming equations are solved using the value
iteration method, and the stationary optimal policy is then obtained with a switching curve
that is neither convex nor concave. We observe that for the analogous homogeneous network
cases the switching curves are symmetric and of threshold type. Individual optimality is
studied in a non-cooperative dynamic game framework, taking the mean service time of a
mobile as the cost criterion for the decision. We show that, for individual optimality in a
WLAN-UMTS hybrid cell, the threshold policy curve is a decreasing step function of the
Poisson arrival rate of mobiles.

Keywords: hybrid, heterogeneous, WLAN, UMTS, MDP, optimization, control, non-cooperative
game
1 Introduction

As 802.11 WLANs and 3G-UMTS cellular coverage networks are being widely deployed,
network operators are seeking to offer seamless and ubiquitous connectivity for high-speed
wireless broadband services through integrated WLAN and UMTS hybrid networks. For
efficient performance of such a hybrid network, one of the core decision problems that a
network operator faces is that of optimal user-network association, i.e., load balancing
by optimally routing an arriving mobile user's connection to one of the two constituent
networks. We study this decision problem under a simplifying assumption of saturated
downlink resource allocation in the lone WLAN and UMTS cells. To be more specific, consider
a hybrid network comprising two independent 802.11 WLAN and 3G-UMTS networks
that offers connectivity to mobile users arriving in the combined coverage area of these two
networks. By independent we mean that transmission activity in one network does not create
interference in the other. Our goal in this paper is to study the dynamics of optimal
user-network association in such a WLAN-UMTS hybrid network. We provide two alternative
modeling approaches that differ according to who takes the association (connection)
decision and what the decision maker's objectives are. In particular, we study two different
dynamic models, and the choice of model depends on whether the optimality
criterion can be represented as a global utility, such as the aggregate network throughput,
or as an individual cost, such as the service time of a mobile user. We concentrate only on
streaming and interactive data transfers. Moreover, we consider only a single QoS class of
mobiles arriving at an average location in the hybrid network, and these mobiles have to be
admitted to one of the two networks, WLAN or UMTS. Note that we do not propose a
full-fledged cell-load or interference based connection admission control (CAC) policy in this
paper. We instead assume that a CAC precedes the association decision control. A connection
admission decision is taken by the CAC controller before any mobile is considered as a
candidate to connect to either the WLAN or the UMTS network. Thereafter, an association
decision only ensures globally or individually optimal performance, and it is not proposed as an
alternative to the CAC decision. However, the association decision controller can still reject
mobiles for optimal performance of the network.

In our model, we introduce certain simplifying assumptions, as compared to a real life 
scenario, in order to gain an analytical insight into the dynamics of user-network association. 
Without these assumptions it may be very hard to study these dynamics in a WLAN-UMTS 
hybrid network. 

1.1 Related Work and Contributions 

The study of WLAN-UMTS hybrid networks is an emerging area of research and not much
related work is available. Authors of some related papers ([?]) have studied
issues such as vertical handover and coupling schemes, integrated architecture layout, radio
resource management (RRM) and mobility management. However, questions related to load
balancing or optimal user-network association have not been explored much. Premkumar
et al. in [8] propose a near optimal solution for a hybrid network within a combinatorial
optimization framework, which is different from our approach. To the best of our knowledge,
ours is the first attempt to explicitly compute globally optimal user-network association
policies for a WLAN-UMTS hybrid network under an SMDP decision control formulation.
Moreover, this work is the first we know of to use stochastic non-cooperative game theory
to predict user behavior in a decentralized decision making situation.

2 Model Framework 

A hybrid network may be composed of several 802.11 WLAN Access Points (APs) and 3G- 
UMTS Base Stations (NodeBs) that are operated by a single network operator. However, 
our focus is only on a single pair of an AP and a NodeB that are located sufficiently close 
to each other so that mobile users arriving in the combined coverage area of this AP-NodeB 
pair have a choice to connect to either of the two networks. We call the combined coverage
area of a single AP cell and a single NodeB micro-cell a hybrid cell. The cell
coverage radius of a UMTS micro-cell is usually around 400 m to 1000 m, whereas that of a
WLAN cell varies from a few tens to a few hundreds of meters. Therefore, some mobiles
arriving in the hybrid cell may only be able to connect to the NodeB, either because they
fall outside the transmission range of the AP or because they are equipped with only 3G
technology. Other mobiles that are equipped with only 802.11 technology can connect
exclusively to the WLAN AP. Apart from these two categories, mobiles equipped with both
802.11 WLAN and 3G-UMTS technologies can connect to either of the two networks.
The decision to connect to either of the two networks can involve different cost or utility
criteria. A cost criterion could be the average service time of a mobile, and an example utility
could be the throughput of a mobile. Moreover, the connection or association decision
involves two different decision makers, the mobile user and the network operator. Leaving the 
decision choice with the mobile user may result in less efficient use of the network resources, 
but may be much more scalable and easier to implement. We thus model the decision 
problem in two different and alternate ways. Firstly, we consider the Global Optimality 
dynamic control formulation in which the network operator dictates the decision of mobile 
users to connect to one of the two networks, so as to optimize a certain global cell utility. 
And secondly, we consider the Individual Optimality dynamic control formulation in which 
a mobile user takes a selfish decision to connect to either of the two networks so that only 
its own cost is optimized. We model the Global optimality problem with an SMDP (Semi 
Markov Decision Process) control approach and the Individual optimality problem under a 
non-cooperative dynamic game framework. Before discussing further the two approaches, 
we first describe below a general framework common to both. We also state some simplifying 
assumptions and expressions for the downlink throughput from previous work. Since the 
bulk of data transfer for a mobile engaged in streaming or interactive data transmission is 
carried over the downlink (AP to mobile or NodeB to mobile), we are interested here in the 
TCP throughput of only downlink. 
2.1 Mobile Arrivals

We model the hybrid cell of an 802.11 WLAN AP and a 3G-UMTS NodeB as an M/G/2
processing server system (Figures 3 and [?]), with each server having a separate finite pole
capacity of $M_{AP}$ and $M_{3G}$ mobiles, respectively. We will give further clarifications on the
pole capacity of each server later in Sections 2.3 and 2.4. As discussed previously, mobiles
are considered as candidates to connect to the hybrid cell only after being admitted by a
CAC, such as the one described in [?]. Some of the admitted mobiles can connect only to
the WLAN AP and some others only to the 3G-UMTS NodeB. These two sets of arriving
mobiles are assumed to constitute two separate dedicated arrival streams with Poisson
rates $\lambda_{AP}$ and $\lambda_{3G}$, respectively. The remaining set of mobiles, which can connect to both
networks, forms a common arrival stream with Poisson rate $\lambda_{AP3G}$. The mobiles of the two
dedicated streams can either directly join their respective AP or NodeB network, without
any connection decision choice involved, or they can be rejected. For mobiles of the common
stream, either a rejection or a connection routing decision has to be taken as to which of
the two networks the arriving mobile will join, while optimizing a certain cost or utility.
It is assumed that all arriving mobiles have a downlink data service requirement which is
exponentially distributed with parameter $\zeta$. In other words, every arriving mobile seeks
to download a data file of average size $1/\zeta$ bits on the downlink. Let $\theta_{AP}(m_c)$ denote the
downlink throughput of each mobile in the AP network when $m_c$ mobiles are connected to it
at any given instant. If $\eta_{DL}$ denotes the downlink cell load of the NodeB cell, then, assuming
$N$ active mobiles to be connected to the NodeB, $\eta = \eta_{DL}/N$ denotes the average load per user
in the cell. Let $\theta_{3G}(\eta)$ denote the downlink throughput of each mobile in the NodeB network
when its average load per user is $\eta$. With the above notation, the effective service rates of
each network or server can be denoted by $\mu_{AP}(m_c) = \zeta \times \theta_{AP}(m_c)$ and $\mu_{3G}(\eta) = \zeta \times \theta_{3G}(\eta)$.
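
As a small numerical illustration of the service-rate definition above, the following Python sketch computes the effective service rate as the product of the service-requirement parameter (called `zeta` here) and a per-mobile downlink throughput, together with the implied mean download time. The function names are illustrative and do not come from the paper; the values $10^{-6}$ and 46 kbps are the ones quoted in the numerical setup of Section 3, and pairing them here is only an example.

```python
def service_rate(zeta: float, throughput_bps: float) -> float:
    """Effective service rate (departures per second) = zeta * per-mobile throughput."""
    return zeta * throughput_bps


def mean_download_time(zeta: float, throughput_bps: float) -> float:
    """Mean time to download a file of average size 1/zeta bits at a fixed throughput."""
    rate = service_rate(zeta, throughput_bps)
    if rate <= 0:
        raise ValueError("service rate must be positive")
    return 1.0 / rate


# Example: average file size 1/zeta = 1e6 bits, throughput 46 kbps (assumed held constant).
mu = service_rate(1e-6, 46e3)            # about 0.046 departures per second
t_mean = mean_download_time(1e-6, 46e3)  # about 21.7 seconds
```

In the model the throughput changes whenever a mobile joins or leaves, so a fixed-throughput download time like this one is only a snapshot for a given system state.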

2.2 Simplifying Assumptions 

We assume a single QoS class of arriving mobiles so that each mobile has an identical
minimum downlink throughput requirement of $\theta_{min}$, i.e., each arriving mobile must achieve a
downlink throughput of at least $\theta_{min}$ bps on either of the two networks. It is further assumed
that each mobile's (receiver's) advertised window $W^*$ is set to 1 in the TCP protocol. This
is known to provide the best performance of TCP (see [?] and references therein).

We further assume saturated resource allocation in the downlink of the AP and NodeB
networks. Specifically, this assumption for the AP network means the following. Assume that
the AP is saturated and has infinitely many packets backlogged in its transmission buffer. In
other words, there is always a packet in the AP's transmission buffer waiting to be transmitted
to each of the connected mobiles. Now, in a WLAN cell, resource allocation to an AP on
the downlink is carried out through the contention based DCF (Distributed Coordination
Function) protocol. If the AP is saturated for a particular mobile's connection and $W^*$ is
set to 1, then this particular mobile can benefit from a higher number of transmission
opportunities (TxOPs) won by the AP for downlink transmission to this mobile (hence a higher
downlink throughput) than if the AP is not saturated or $W^*$ is not set to 1. Thus, with
the above assumptions, mobiles can be allocated downlink throughputs greater than their
QoS requirement of $\theta_{min}$, and cell resources in terms of TxOPs on the downlink will be
maximally utilized.

For the NodeB network, the saturated resource allocation assumption means the following.
It is assumed that at any given instant the NodeB cell resources on the downlink
are fully utilized, resulting in a constant maximum cell load of $\eta_{DL}^{max}$. This is analogous to the
maximal utilization of TxOPs in the AP network discussed in the previous paragraph. With
this maximum cell load assumption, even if a mobile has a minimum throughput requirement
of only $\theta_{min}$ bps, it can actually be allocated a higher throughput if additional unutilized cell
resources are available, so that the cell load is always at its maximum of $\eta_{DL}^{max}$. If, say, a new
mobile $j$ arrives and it is possible to accommodate its connection while maintaining the QoS
requirements of the presently connected mobiles (this will be decided by the CAC), then the
NodeB will initiate a renegotiation of QoS attributes (or bearer attributes) procedure with
all the presently connected mobiles. All presently connected mobiles will then be allocated
a lower throughput than the one prior to the set-up of mobile $j$'s connection. However,
this new lower throughput will still be higher than each mobile's QoS requirement. This
kind of renegotiation of QoS attributes is indeed possible in UMTS and is one of its
special features (see Chapter 7 in [12]). Also note the key point that the average
load per user $\eta$, as defined previously in Section 2.1, decreases with an increasing number of
mobiles connected to the NodeB. Though the total cell load is always at its maximum of
$\eta_{DL}^{max}$, the contribution to this total load from a single mobile (i.e., the load per user $\eta$) decreases as
more mobiles connect to the NodeB cell. We define $\Delta(\eta)$ as the average change in $\eta$ caused
by a new mobile that connects to the NodeB cell. Therefore, when a new mobile connects,
the load per user drops from $\eta$ to $\eta - \Delta(\eta)$, and when a mobile disconnects, the load per
user increases from $\eta$ to $\eta + \Delta(\eta)$.
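
The behaviour of the load per user can be illustrated with a short Python sketch. It assumes, as one reading of the definitions above, that the load per user equals the constant maximum cell load divided by the number of connected mobiles, and it computes the exact change caused by one arrival rather than the average change used in the text. The function names are hypothetical, and 0.9 is the maximum cell load quoted later in the paper.

```python
ETA_DL_MAX = 0.9  # maximum downlink cell load, as quoted later in the text


def load_per_user(n_mobiles: int, eta_dl_max: float = ETA_DL_MAX) -> float:
    """Load per user when the cell load stays at its maximum and is shared equally."""
    if n_mobiles < 1:
        raise ValueError("at least one connected mobile is required")
    return eta_dl_max / n_mobiles


def delta_on_arrival(n_mobiles: int, eta_dl_max: float = ETA_DL_MAX) -> float:
    """Exact drop in load per user when one more mobile joins n_mobiles (assumption)."""
    return load_per_user(n_mobiles, eta_dl_max) - load_per_user(n_mobiles + 1, eta_dl_max)


# Load per user for 1, 2 and 18 connected mobiles: roughly 0.9, 0.45 and 0.05.
loads = [load_per_user(n) for n in (1, 2, 18)]
```

Under this reading, 18 connected mobiles correspond to a load per user of about 0.05, which matches the lower end of the range of load per user used in the SMDP state space.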

In the downlink, the inter-cell to intra-cell interference ratio, denoted by $i_j$, and the
orthogonality factor, denoted by $\alpha_j$, are different for each mobile $j$ depending on its location in the
NodeB cell. Moreover, the throughput achieved by each mobile is interference limited and
depends on the signal to interference plus noise ratio (SINR) received at that mobile. Thus,
in the absence of any power control, the throughput also depends on the location of the mobile
in the NodeB cell. We assume a uniform SINR scenario where closed-loop fast power control
is applied in the NodeB cell, so that each mobile receives approximately the same SINR.
We therefore assume that all mobiles in the NodeB cell are allocated equal throughputs.
This kind of power control will allocate more power to users far away from the NodeB
that are subject to higher path-loss, fading and neighboring cell interference. Users closer to
the NodeB will be allocated relatively less power since they experience weaker signal
attenuation. In fact, such a fair throughput allocation can also be achieved by adopting a
fair and power-efficient channel dependent scheduling scheme as described in [?]. Now, since
all mobiles are allocated equal throughputs, it can be said that mobiles arrive at an average
location in the NodeB cell (see Section 8.2.2.2 in [12]). Therefore, all mobiles are assumed to
have an identical average inter-cell to intra-cell interference ratio $i$ and an identical average
orthogonality factor $\alpha$.
Figure 1: Total throughput of all mobiles in an AP cell



The assumption of saturated resource allocation is a standard one, usually
adopted to simplify the modeling of complex network frameworks like those of WLAN and
UMTS (see, e.g., [12, 14]). Mobiles in the NodeB cell are assumed to be allocated equal
throughputs in order to have a scenario comparable to that of an AP cell, in which mobiles
are also known to achieve fair and equal throughput allocation (see Section 2.3). Moreover,
such fair throughput allocation is known to result in better delay performance for typical
file transfers in UMTS (see [12]). Furthermore, the assumption of mobiles arriving at an
average location in the NodeB cell is essential in order to simplify our global and individual
optimality models. For instance, in the global optimality model, without this assumption the hybrid
network system state would have to include the location of each mobile. This would result in a
higher dimensional SMDP problem which is analytically intractable.

2.3 Downlink Throughput in 802.11 WLAN AP 

We reuse the downlink TCP throughput formula for a mobile in a WLAN from [?]. For
completeness, we briefly mention here the network model that has been extensively studied
in [?] and then simply restate the throughput expression without going into much detail.
Each mobile connected to the AP uses the Distributed Coordination Function (DCF) protocol
with an RTS/CTS frame exchange before any data-ack frame exchange, and each mobile
has an equal probability of the channel being allocated to it. With the assumption of $W^*$
being set to 1 (Section 2.2), any mobile will always have a TCP ack waiting to be sent back to
the AP with probability 1/2, which is also the probability that it contends for the channel.
This is, however, true only for those versions of TCP that do not use delayed acks. If the AP
is always saturated or backlogged, the average number of backlogged stations contending for
the channel is given by $m_b = 1 + m_c/2$. Based on this assumption, and since for any connection
an ack is sent by the mobile for every TCP packet received, the downlink TCP throughput
of a single mobile is given by Section 3.2 in [16] as

$$\theta_{AP}(m_c) = \frac{L_{TCP}}{m_c \,(T_{TCPdata} + T_{TCPack} + 2T_{tbo} + 2T_w)}, \qquad (1)$$

where $L_{TCP}$ is the size of TCP packets and $T_{TCPdata}$ and $T_{TCPack}$ are the raw transmission
times of a TCP data and a TCP ack packet, respectively. $T_{tbo}$ and $T_w$ denote the mean total
time spent in back-off and the average total time wasted in collisions for any successful packet
transmission, and are computed assuming $m_b$ backlogged stations. The explicit expressions
for $T_{TCPdata}$, $T_{TCPack}$, $T_{tbo}$ and $T_w$ can be found in [?]. However, we mention here
that they depend on certain quantities whose numerical values are provided in Section
3. Note that all mobiles connected to the AP achieve equal downlink TCP throughputs in
a fair manner, given by Equation (1). Figure 1 shows a plot of the total cell throughput in an AP
cell. Since the total throughput decreases monotonically with an increasing number of mobiles,
the pole capacity of an AP cell, $M_{AP}$, is limited by the QoS requirement of $\theta_{min}$ bps of each
mobile.

Figure 2: Total throughput of all mobiles in NodeB cell

RR n° 0123456789

Dinesh Kumar, Eitan Altman & Jean-Marc Kelif
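
A minimal Python sketch of the per-mobile AP throughput is given below. It follows the structure described above: packet size divided by the number of connected mobiles times the sum of the data and ack transmission times plus twice the back-off and collision times. The count of contending stations is written as one (the always-backlogged AP) plus half the number of mobiles, which is this sketch's reading of the text. The timing values in the example are purely hypothetical, since the explicit expressions for the timings are not reproduced in this section.

```python
def backlogged_stations(m_c: int) -> float:
    """Average number of contending stations: the AP plus m_c mobiles, each w.p. 1/2."""
    return 1 + m_c / 2


def theta_ap(m_c: int, l_tcp: float, t_data: float, t_ack: float,
             t_tbo: float, t_w: float) -> float:
    """Per-mobile downlink TCP throughput in bps for m_c connected mobiles."""
    if m_c < 1:
        raise ValueError("at least one connected mobile is required")
    return l_tcp / (m_c * (t_data + t_ack + 2 * t_tbo + 2 * t_w))


# Hypothetical timings in seconds; in the model t_tbo and t_w depend on
# backlogged_stations(m_c) and would be recomputed for every m_c.
example = theta_ap(m_c=2, l_tcp=8000, t_data=1e-3, t_ack=0.5e-3,
                   t_tbo=0.2e-3, t_w=0.05e-3)
```

With fixed timings the total throughput `m_c * theta_ap(...)` would not depend on `m_c`; the decrease in total throughput mentioned in the text comes from the back-off and collision times growing with the number of contending stations.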

2.4 Downlink Throughput in 3G-UMTS NodeB 

We consider a standard model for data transmission on the downlink in a 3G-UMTS NodeB
cell. Let $W$ be the WCDMA modulation bandwidth. If SINR denotes the signal to
interference plus noise ratio received at a mobile, then its energy per bit to noise density
ratio is given by

$$\frac{E_b}{N_0} = \frac{W}{\theta_{3G}} \times SINR. \qquad (2)$$

Table 1:



Now, under the assumptions of identical throughput allocation to each mobile arriving at
an average location and application of power control so that each mobile receives the same
SINR (Section 2.2), we deduce from Eq. (2) that each mobile requires the same $E_b/N_0$ ratio
in order to be able to successfully decode the NodeB's transmission. From Chapter 8 in [12] we
can thus say that the downlink TCP throughput $\theta_{3G}$ of any mobile, in a NodeB cell with
saturated resource allocation, as a function of the load per user $\eta$ is given by

$$\theta_{3G}(\eta) = \frac{\eta W}{(E_b/N_0)(1 - \alpha + i)}, \qquad (3)$$

where $\alpha$ and $i$ have been defined before in Section 2.2. Figure 2 shows a plot of the total cell
throughput of all mobiles against $\log(\eta)$ in a UMTS NodeB cell. The load per user $\eta$ has been
stretched to a logarithmic scale for better presentation. Also note that throughput values
have been plotted in the second quadrant. As we go away from the origin on the horizontal
axis, $\log(\eta)$ (and $\eta$) decreases or, equivalently, the number of connected mobiles increases. The
equivalence between the $\eta$ and $\log(\eta)$ scales and the number of mobiles $N(\eta)$ can be found in
Table 1.

It is to be noted here that the $E_b/N_0$ ratio required by each mobile is a function of its
throughput. Also, if the NodeB cell is fully loaded with $\eta_{DL} = \eta_{DL}^{max}$ and if each mobile
operates at its minimum throughput requirement of $\theta_{min}$, then we can easily compute the
pole capacity $M_{3G}$ of the cell as

$$M_{3G} = \frac{\eta_{DL}^{max} W}{\theta_{min}\,(E_b/N_0)(1 - \alpha + i)}. \qquad (4)$$
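
The NodeB throughput and pole capacity expressions can be checked against each other numerically. The Python sketch below is only illustrative: it treats the required energy-per-bit to noise-density ratio as a single constant in linear scale (the value 5.0 is hypothetical, since the table values are not reproduced here), whereas the text notes that this ratio actually depends on the throughput. The chip rate, orthogonality factor, interference ratio, maximum load and minimum throughput are the values quoted elsewhere in the paper.

```python
def theta_3g(eta: float, w: float, eb_n0: float, alpha: float, i: float) -> float:
    """Per-mobile downlink throughput in bps at load per user eta."""
    return eta * w / (eb_n0 * (1.0 - alpha + i))


def pole_capacity_3g(eta_dl_max: float, w: float, theta_min: float,
                     eb_n0: float, alpha: float, i: float) -> float:
    """Pole capacity (possibly fractional) when every mobile runs at theta_min."""
    return eta_dl_max * w / (theta_min * eb_n0 * (1.0 - alpha + i))


W = 3.84e6        # chip rate in chips per second
ALPHA, I = 0.9, 0.7
EB_N0 = 5.0       # hypothetical linear Eb/N0, not taken from the paper's table
ETA_DL_MAX, THETA_MIN = 0.9, 46e3

m_3g = pole_capacity_3g(ETA_DL_MAX, W, THETA_MIN, EB_N0, ALPHA, I)
# At the pole capacity the load per user is ETA_DL_MAX / m_3g and the
# throughput formula should return exactly the minimum requirement.
theta_at_pole = theta_3g(ETA_DL_MAX / m_3g, W, EB_N0, ALPHA, I)
```

In practice the pole capacity would be rounded down to an integer number of mobiles, and the ratio would be read from the table for the throughput in question.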

For $\eta_{DL}^{max} = 0.9$ and a typical NodeB cell scenario that employs the closed-loop fast power
control mechanism mentioned previously in Section 2.2, Table 1 shows the SINR (fourth
column) received at each mobile as a function of the average load per user (first column).
Note that we consider a maximum cell load of 0.9 and not 1 in order to avoid instability
conditions in the cell. These values of SINR have been obtained from radio layer simulations
of a NodeB cell. The values shown here have been slightly modified since the original values
are part of a confidential internal document at France Telecom R&D. The fifth column shows
the downlink throughput with a block error rate (BLER) of $10^{-2}$ that can be achieved by
each mobile as a function of the SINR observed at that mobile. The sixth column in
the table lists the corresponding values of the $E_b/N_0$ ratio (obtained from Equation (2)) that are
required at each mobile to successfully decode the NodeB's transmission.

Figure 3: Hybrid cell scenario under Global optimality



3 Global Optimality: SMDP control formulation 

In the Global Optimality approach, it is the network operator that takes the optimal decision
for each mobile as to which of the two networks, AP or NodeB, the mobile will connect to
after it has been admitted into the hybrid cell by the CAC controller (Figure 3). Since
decisions have to be made at each arrival, this gives an SMDP structure to the decision
problem, and we state the equivalent SMDP problem as follows:

• States: The state of a hybrid cell system is denoted by the tuple $(m_c, \eta)$, where $m_c$
($0 \le m_c \le M_{AP}$) denotes the number of mobiles connected to the AP and $\eta$
($0.05 \le \eta \le 0.9$) is the load per user of the NodeB cell.

• Events: We consider two distinguishable events: (i) arrival of a new mobile after it
has been admitted by the CAC and (ii) departure of a mobile after service completion.

• Decisions: For mobiles arriving in the common stream, a decision action $a \in \{0, 1, 2\}$
has to be taken: $a = 0$ represents rejecting the mobile, $a = 1$ represents routing the
mobile connection to the AP network and $a = 2$ represents routing the mobile connection
to the NodeB network.

• Rewards: Whenever a new incoming mobile is either rejected or routed to one of the
two networks, it generates a certain state dependent reward. Generally, the aggregate
throughput of an AP or NodeB cell drops when an additional new mobile connects to
it. However, the network operator gains some financial revenue from the mobile user
at the same time. There is thus a trade-off between revenue gain and the aggregate
network throughput, which motivates us to formulate the reward as follows. The reward
consists of the sum of a fixed financial revenue price component and $\beta$ times an
aggregate network throughput component, where $\beta$ is an appropriate proportionality
constant. When a mobile of the dedicated arrival streams is routed to the corresponding
AP or NodeB, it generates a financial revenue of $f_{AP}$ and $f_{3G}$, respectively. A
mobile of the common stream generates a financial revenue of $f_{AP3G \to AP}$ on being
routed to the AP and $f_{AP3G \to 3G}$ on being routed to the NodeB. Any mobile that is
rejected does not generate any financial revenue. The throughput component of the
reward is represented by the aggregate network throughput of the corresponding AP
or NodeB network to which a newly arrived mobile connects, taking into account the
change in the state of the system caused by this new mobile's connection. If, instead,
a newly arrived mobile in a dedicated stream is rejected, then the throughput component
represents the aggregate network throughput of the corresponding AP or NodeB
network, taking into account the unchanged state of the system. For a rejected mobile
belonging to the common stream, it is the maximum of the aggregate network
throughputs of the two networks that is considered.

• Criterion: The optimality criterion is to maximize the total expected discounted reward
over an infinite horizon and obtain a deterministic and stationary optimal policy.

Note that in the SMDP problem statement above, state transition probabilities have not
been mentioned because, depending on the action taken, the system moves into a unique new
state deterministically, i.e., w.p. 1. For instance, when action $a = 1$ is taken, the state evolves
from $(m_c, \eta)$ to $(m_c + 1, \eta)$, and when action $a = 2$ is taken, the state evolves from $(m_c, \eta)$
to $(m_c, \eta - \Delta(\eta))$. Applying the well-known uniformization technique from [?], we can say
that events (i.e., arrival or departure of mobiles) occur at the jump times of the combined
Poisson process of all types of events with rate $\Lambda := \lambda_{AP} + \lambda_{3G} + \lambda_{AP3G} + \check{\mu}_{AP} + \check{\mu}_{3G}$,
where $\check{\mu}_{AP} := \max_{m_c} \mu_{AP}(m_c)$ and $\check{\mu}_{3G} := \max_{\eta} \mu_{3G}(\eta)$. The departure of a mobile is
either a real departure or an artificial departure; the latter arises when, from a single mobile's point of view,
the corresponding server slows down due to a large number of mobiles in the network. Then
any event that occurs corresponds to an arrival on the dedicated streams with probability
$\lambda_{AP}/\Lambda$ and $\lambda_{3G}/\Lambda$, an arrival on the common stream with probability $\lambda_{AP3G}/\Lambda$, and a
real departure with probability $\mu_{AP}(m_c)/\Lambda$ or $\mu_{3G}(\eta)/\Lambda$. As a result, the time periods
between consecutive events are i.i.d. and we can consider an $n$-stage SMDP
decision problem. Let $V_n(m_c, \eta)$ denote the maximum expected $n$-stage discounted reward
for the hybrid cell when the system is in state $(m_c, \eta)$. The stationary optimal policy that
achieves the maximum total expected discounted reward over an infinite horizon can then
be obtained as a solution of the $n$-stage problem as $n \to \infty$. The discount factor is denoted
by $\gamma$ ($0 < \gamma < 1$) and determines the relative worth of the present reward versus the future
rewards. The state $(m_c, \eta)$ of the system is observed right after the occurrence of an event,
for example, right after a newly arriving mobile in the common stream has been routed
to one of the networks, or right after the departure of a mobile. Let $U_n(m_c, \eta; a)$ denote
the maximum expected $n$-stage discounted reward for the hybrid cell when the system
is in state $(m_c, \eta)$, given that an arrival event has occurred and given that action $a$ will
be taken for this newly arrived mobile. We can then write down the following recursive
Dynamic Programming (DP) equation to solve our SMDP decision problem: $\forall n \ge 0$ and
$0 \le m_c \le M_{AP}$, $0.05 \le \eta \le 0.9$,


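$$
\begin{aligned}
V_{n+1}(m_c, \eta) ={}& \frac{\lambda_{AP}}{\Lambda} \max_{a \in \{0,1\}} \{ R_{AP}(m_c, \eta; a) + \gamma U_n(m_c, \eta; a) \} \\
&+ \frac{\lambda_{3G}}{\Lambda} \max_{a \in \{0,2\}} \{ R_{3G}(m_c, \eta; a) + \gamma U_n(m_c, \eta; a) \} \\
&+ \frac{\lambda_{AP3G}}{\Lambda} \max_{a \in \{0,1,2\}} \{ R_{AP3G}(m_c, \eta; a) + \gamma U_n(m_c, \eta; a) \} \\
&+ \frac{\mu_{AP}(m_c)}{\Lambda}\, \gamma V_n((m_c - 1) \vee 0, \eta) \\
&+ \frac{\mu_{3G}(\eta)}{\Lambda}\, \gamma V_n(m_c, (\eta + \Delta(\eta)) \wedge 0.9) \\
&+ \frac{\Lambda - (\lambda_{AP} + \lambda_{3G} + \lambda_{AP3G} + \mu_{AP}(m_c) + \mu_{3G}(\eta))}{\Lambda}\, \gamma V_n(m_c, \eta), \qquad (5)
\end{aligned}
$$

where,

$$
R_{AP}(m_c, \eta; a) =
\begin{cases}
\beta\, m_c\, \theta_{AP}(m_c) & : a = 0 \\
f_{AP} + \beta\, (m_c + 1)\, \theta_{AP}(m_c + 1) & : a = 1,\ m_c < M_{AP} \\
\beta\, m_c\, \theta_{AP}(m_c) & : a = 1,\ m_c = M_{AP}
\end{cases}
\qquad (6)
$$

$$
R_{3G}(m_c, \eta; a) =
\begin{cases}
\beta\, N(\eta)\, \theta_{3G}(\eta) & : a = 0 \\
f_{3G} + \beta\, N(\eta - \Delta(\eta))\, \theta_{3G}(\eta - \Delta(\eta)) & : a = 2,\ N(\eta) < M_{3G} \\
\beta\, N(\eta)\, \theta_{3G}(\eta) & : a = 2,\ N(\eta) = M_{3G}
\end{cases}
\qquad (7)
$$

$$
R_{AP3G}(m_c, \eta; a) =
\begin{cases}
\max\{\beta\, m_c\, \theta_{AP}(m_c),\ \beta\, N(\eta)\, \theta_{3G}(\eta)\} & : a = 0 \\
f_{AP3G \to AP} + \beta\, (m_c + 1)\, \theta_{AP}(m_c + 1) & : a = 1,\ m_c < M_{AP} \\
\beta\, m_c\, \theta_{AP}(m_c) & : a = 1,\ m_c = M_{AP} \\
f_{AP3G \to 3G} + \beta\, N(\eta - \Delta(\eta))\, \theta_{3G}(\eta - \Delta(\eta)) & : a = 2,\ N(\eta) < M_{3G} \\
\beta\, N(\eta)\, \theta_{3G}(\eta) & : a = 2,\ N(\eta) = M_{3G}
\end{cases}
\qquad (8)
$$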
and $U_n(m_c, \eta; 0) := V_n(m_c, \eta)$, $U_n(m_c, \eta; 1) := V_n((m_c + 1) \wedge M_{AP}, \eta)$, $U_n(m_c, \eta; 2) :=
V_n(m_c, (\eta - \Delta(\eta)) \vee 0.05)$ for $\theta_{min} = 46$ kbps, and $N(\eta)$ can be obtained from Table 1. We
solve the above DP equation with the Value Iteration method using the following numerical
values for the various quantities: $L_{TCP} = 8000$ bits (size of TCP packets), $L_{MAC} = 272$ bits,
$L_{IPH} = 320$ bits (size of MAC and TCP/IP headers), $L_{ACK} = 112$ bits (size of MAC layer
ACK), $L_{RTS} = 180$ bits, $L_{CTS} = 112$ bits (size of RTS and CTS frames), $R_{data} = 11$
Mbits/s, $R_{control} = 2$ Mbits/s (802.11 data transmission and control rates), $CW_{min} = 32$
(minimum 802.11 contention window), $T_P = 144\ \mu s$, $T_{PHY} = 48\ \mu s$ (times to transmit the
PLCP preamble and PHY layer header), $T_{DIFS} = 50\ \mu s$, $T_{SIFS} = 10\ \mu s$ (distributed
inter-frame spacing time and short inter-frame spacing time), $T_{slot} = 20\ \mu s$ (slot size time), $K = 7$
(retry limit in the 802.11 standard), $b_0 = 16$ (initial mean back-off), $p = 2$ (exponential back-off
multiplier), $\gamma = 0.8$, $\lambda_{AP} = 0.03$, $\lambda_{3G} = 0.03$, $\lambda_{AP3G} = 0.01$, $\zeta = 10^{-6}$, $\beta = 10^{-6}$, $M_{AP} = 18$
and $M_{3G} = 18$ for $\theta_{min} = 46$ kbps, $\alpha = 0.9$ for the ITU Pedestrian A channel, $i = 0.7$, $W = 3.84$
Mcps and other values as illustrated in Table 1.
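
The value iteration procedure can be sketched in Python as follows. This is a toy version under clearly stated assumptions, not the authors' implementation: the two throughput functions are hypothetical stand-ins for the real AP and NodeB formulas, the NodeB state is indexed by the number of connected mobiles `n` (with load per user 0.9 / n) instead of by the load per user itself, and all revenues are set to 5. Arrival rates, discount factor, capacities and the two scaling constants use the values listed above.

```python
import itertools

M_AP, M_3G, ETA_MAX = 18, 18, 0.9
LAM_AP, LAM_3G, LAM_C = 0.03, 0.03, 0.01       # dedicated and common arrival rates
ZETA, BETA, GAMMA = 1e-6, 1e-6, 0.8
F_AP = F_3G = F_C_AP = F_C_3G = 5.0            # hypothetical revenues


def theta_ap(m):
    """Toy per-mobile AP throughput in bps (hypothetical, decreasing total)."""
    return 0.0 if m == 0 else 4.5e6 / (m * (1.0 + 0.05 * m))


def theta_3g(n):
    """Toy per-mobile NodeB throughput in bps at load per user ETA_MAX / n."""
    return (ETA_MAX / n) * 1.0e6


def agg_ap(m):
    return m * theta_ap(m)


def agg_3g(n):
    return n * theta_3g(n)


def route_reward(agg, k, k_max, revenue):
    """Revenue plus BETA * aggregate throughput after the join; no revenue when full."""
    return revenue + BETA * agg(k + 1) if k < k_max else BETA * agg(k)


MU_AP = [ZETA * theta_ap(m) for m in range(M_AP + 1)]
MU_3G = {n: ZETA * theta_3g(n) for n in range(1, M_3G + 1)}
LAM = LAM_AP + LAM_3G + LAM_C + max(MU_AP) + max(MU_3G.values())  # uniformization rate


def value_iteration(tol=1e-9, max_iter=10_000):
    states = list(itertools.product(range(M_AP + 1), range(1, M_3G + 1)))
    V = dict.fromkeys(states, 0.0)
    policy = {}
    for _ in range(max_iter):
        V_new = {}
        for m, n in states:
            stay = GAMMA * V[(m, n)]
            to_ap = GAMMA * V[(min(m + 1, M_AP), n)]
            to_3g = GAMMA * V[(m, min(n + 1, M_3G))]
            rej_ap, rej_3g = BETA * agg_ap(m), BETA * agg_3g(n)
            q_ap = max(rej_ap + stay, route_reward(agg_ap, m, M_AP, F_AP) + to_ap)
            q_3g = max(rej_3g + stay, route_reward(agg_3g, n, M_3G, F_3G) + to_3g)
            q_common = [max(rej_ap, rej_3g) + stay,                      # a = 0
                        route_reward(agg_ap, m, M_AP, F_C_AP) + to_ap,   # a = 1
                        route_reward(agg_3g, n, M_3G, F_C_3G) + to_3g]   # a = 2
            policy[(m, n)] = max(range(3), key=q_common.__getitem__)
            mu_a, mu_g = MU_AP[m], MU_3G[n]
            idle = LAM - (LAM_AP + LAM_3G + LAM_C + mu_a + mu_g)         # fictitious events
            V_new[(m, n)] = (LAM_AP * q_ap + LAM_3G * q_3g + LAM_C * max(q_common)
                             + mu_a * GAMMA * V[(max(m - 1, 0), n)]
                             + mu_g * GAMMA * V[(m, max(n - 1, 1))]
                             + idle * stay) / LAM
        gap = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if gap < tol:
            break
    return V, policy


V, policy = value_iteration()
```

The returned `policy` maps each state `(m, n)` to the common-stream action (0 reject, 1 AP, 2 NodeB). Because the update is a contraction with modulus equal to the discount factor, the loop stops after a modest number of sweeps; with the real throughput tables in place of the toy functions, the same loop would produce the switching curves discussed below.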

The DP equation has been solved for three different kinds of network setups. We first
study the simple homogeneous network case where both networks are APs, so that an incoming
mobile belonging to the common stream must be offered a connection choice between two
identical AP networks. Next, we study an analogous case where both networks are NodeB
terminals. We study these two cases in order to gain some insight into connection routing
dynamics in simple homogeneous network setups before studying the third, more complex,
hybrid AP-NodeB scenario. Figures 4-8 show the optimal connection routing policy for the
three network setups. Note that the plot for the NodeB-NodeB setup is in the 3rd quadrant and the plots
for the AP-NodeB hybrid cell are in the 2nd quadrant. In all these figures a square box symbol (□) denotes
routing a mobile's connection to the first network, a star symbol (*) denotes routing to the
second network and a cross symbol (x) denotes rejecting a mobile altogether.

In Figure 4, the optimal policy for the common stream in an AP-AP homogenous network setup is shown with $f_{AP_1AP_2 \to AP_1} = f_{AP_1AP_2 \to AP_2} = 5$ (with some abuse of notation). The optimal policy routes mobiles of the common stream to the network that has fewer mobiles than the other one. We refer to this behavior as the mobile-balancing network phenomenon. This happens because the total throughput of an AP network decreases with an increasing number of mobiles (see the AP throughput figure). Therefore, an AP network with a higher number of mobiles offers a smaller reward in terms of network throughput, and a mobile generates greater incentive by joining the network with fewer mobiles. Also note that the optimal routing policy in this case is symmetric and of threshold type, with the threshold switching curve being the coordinate line $y = x$.

Figure 5 shows the optimal routing policy for the common stream in a NodeB-NodeB homogenous network setup. With equal financial incentives for the mobiles, i.e., $f_{3G_1 3G_2 \to 3G_1} = f_{3G_1 3G_2 \to 3G_2} = 5$ (with some abuse of notation), we observe a very interesting switching curve structure. The state space in Figure 5 is divided into an L-shaped region (at bottom-left) and a quadrilateral shaped region (at top-right) under the optimal policy. Each region, taken separately, is symmetric around the coordinate diagonal line $y = x$. With some abuse of notation, consider the state $(\eta_1, \eta_2) = (-0.79851, -1.4917)$ (not the coordinate point) of the homogenous network on logarithmic scale in the upper triangle of the quadrilateral region. From the table, this corresponds to the network state in which the load per user in the first NodeB network is 0.45, which is more than the load per user of 0.225 in the second NodeB network. Equivalently, there are fewer mobiles connected to the first network than to the second network. Ideally, one would expect new mobiles to be routed to the first network rather than the second network. However, according to Figure 5, in this state the optimal policy is to route to the second network even though more mobiles are connected to it than to the first. We refer to this behavior as the mobile-greedy network phenomenon and explain the intuition behind it in the following paragraph. The routing policies on the boundary coordinate lines are clearly comprehensible. On the line $y = -2.9957$, when the first network is full (i.e., with least possible load per user), incoming mobiles are routed to the second network (if possible), and vice versa for the line $x = -2.9957$. When both networks are full, incoming mobiles are rejected, which is indicated by the cross at coordinate point $(x, y) = (-2.9957, -2.9957)$.

Figure 4: Optimal policy for common flow in AP-AP setup. First network:

Figure 6: Optimal policy for common flow in AP-NodeB hybrid cell. First network: AP, Second Network: NodeB.
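The logarithmic coordinates quoted above can be checked against the load-per-user values with a few lines of Python. This is a minimal sketch that assumes the plots use the natural logarithm (which reproduces the quoted numbers) and that 0.05 is the minimum load per user of a full NodeB network; the helper name `log_load` is hypothetical.

```python
import math

# Load-per-user values quoted in the text, plus the minimum load 0.05
# that corresponds to a full NodeB network.
loads = {"first NodeB": 0.45, "second NodeB": 0.225, "full network": 0.05}


def log_load(eta: float) -> float:
    """Natural logarithm of the load per user, as used on the plot axes."""
    return math.log(eta)


for name, eta in loads.items():
    print(f"{name}: eta = {eta}, ln(eta) = {log_load(eta):.4f}")
```

A smaller load per user therefore maps to a more negative coordinate, which is why the full-network boundaries sit at $-2.9957$ on both axes.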

The reason behind the mobile-greedy phenomenon in Figure 5 can be attributed to the fact that in a NodeB network, the total throughput increases with decreasing average load per user up to a particular threshold (say $\eta_{thres}$) and decreases thereafter (see the NodeB throughput figure). Therefore, routing new mobiles to a network with lesser (but greater than $\eta_{thres}$) load per user results in a higher reward in terms of total network throughput than routing new mobiles to the other network with greater load per user. However, the mobile-greedy phenomenon is limited to the quadrilateral shaped region. In the L-shaped region, the throughput of a NodeB network decreases with decreasing load per user, contrary to the quadrilateral region where the throughput increases with decreasing load per user. Hence, in the L-shaped region a higher reward is obtained by routing to the network having higher load per user (fewer mobiles) than by routing to the network with lesser load per user (more mobiles). In this sense the L-shaped region shows characteristics similar to the mobile-balancing phenomenon observed in the AP-AP network setup (Figure 4).

We finally discuss the hybrid AP-NodeB network setup. Here we consider financial revenue gains of $f_{AP3G \to AP} = 5$ and $f_{AP3G \to 3G} = 5.65$, motivated by the fact that a network operator can charge more for a UMTS connection since it offers a larger coverage area and, moreover, UMTS equipment is more expensive to install and maintain than WLAN equipment. In Figure 6 we observe that the state space is divided into two regions by the optimal policy switching curve, which is neither convex nor concave. Moreover, in some regions of the state space the mobile-balancing network phenomenon is observed, whereas in some other regions the mobile-greedy network phenomenon is observed. In some sense, this can be attributed to the symmetric threshold type switching curve and the symmetric L-shaped and quadrilateral shaped regions in the corresponding AP-AP and NodeB-NodeB homogenous network setups, respectively. Figures 7 and 8 show the optimal policies for dedicated streams in an AP-NodeB hybrid cell with $f_{AP} = f_{3G} = 0$. The optimal policy accepts new mobiles in the AP network only when there are none already connected. This happens because the network throughput of an AP is zero when there are no mobiles connected, and a non-zero reward is obtained by accepting a mobile. Thereafter, since $f_{AP} = 0$, the policy rejects all incoming mobiles because the network throughput, and hence the corresponding reward, decreases with an increasing number of mobiles. Similarly, for the mobiles dedicated to the NodeB network, the optimal policy accepts new mobiles as long as the network throughput increases (see the NodeB throughput figure) and rejects them thereafter, due to the absence of any financial reward component and the decrease in the network throughput. Note that we have considered zero financial gains here ($f_{AP} = f_{3G} = 0$) to be able to exhibit the existence of these threshold type policies for the dedicated streams.

Figure 7: Optimal policy for AP dedicated flow in AP-NodeB hybrid cell

Figure 8: Optimal policy for NodeB dedicated flow in AP-NodeB hybrid cell
Figure 9: Hybrid cell scenario under Individual optimality

4 Individual Optimality: Non-cooperative Dynamic Game 

In the Individual Optimality approach, we assume that an arriving mobile must itself selfishly decide to join one of the two networks such that its own cost is optimized. We consider the average service time of a mobile as the decision cost criterion, and an incoming mobile connects to either the AP or the NodeB network depending on which of them offers the minimum average service time. We study this model within an extension of the framework of [18], where an incoming user can either join a shared server with a PS (processor sharing) service mechanism or any of several dedicated servers. Based on the estimate of its expected service time on either of the two servers, a mobile decides to join the server on which its expected service time is least. This framework can be applied to our hybrid cell scenario so that the AP is modeled by the shared server and the dedicated DCH channels of the NodeB are modeled by the dedicated servers. For simplicity, we refer to the several dedicated servers as one single dedicated server that consists of a pool of dedicated servers. Then the NodeB comprising the dedicated DCH channels is modeled by this single dedicated server, and this type of framework then fits well with our original setting in Section 2.1. Thus we again have an M/G/2 processing server situation (see Figure 9). As mentioned before, the mobiles of dedicated streams directly join their respective AP or NodeB network. Mobiles arriving in the common stream decide to join one of the two networks based on their estimate of the expected service time in each one of them. However, an estimate of the expected service time of an arriving mobile $j$ must be made taking into account the effect of subsequently arriving mobiles. But these subsequently arriving mobiles are themselves faced with a similar decision problem, and hence their decisions will affect the performance of mobile $j$, which is presently attempting to connect, and of other mobiles already in service. This dependence thus induces a non-cooperative game structure on the decision problem, and we seek here to study the Nash equilibrium solution of the game. The existence, uniqueness and structure of the equilibrium point have already been proved in [18]. Here we seek to analytically determine the service time estimate and explicitly compute the equilibrium threshold policy. As in [18], a decision rule or policy for a new mobile is a function $u : \{0, 1, \ldots, M_{AP} - 1\} \to [0, 1]$, where $M_{AP}$ is the pole capacity of the AP network. Thus for each possible state of the AP network, denoted by the number of mobiles already connected, $m_c$, a new mobile takes a randomized decision $u(m_c) \in [0, 1]$ that specifies the probability of connecting to the AP. Then $1 - u(m_c)$ represents the probability of either connecting to the NodeB or abandoning the connection attempt altogether if both networks are full to their pole capacity. A policy profile $\pi = (u_0, u_1, \ldots)$ is a collection of decision rules followed by all arriving mobiles indexed $(0, 1, \ldots)$.

Define $V_{AP}(m_c, \pi)$ as the expected service time of a mobile in the AP network, given that it joins that network, $m_c$ mobiles are already present and all subsequent mobiles follow the policy profile $\pi$. A single mobile generally achieves lower throughput (i.e., higher service times) in a NodeB network than in an AP network. For simplification, we assume a worst case estimate for the expected service time of a mobile in the NodeB network. Denote $\bar{\mu}_{3G} := \min_{\eta} \mu_{3G}(\eta)$ and let $\tau := 1/\bar{\mu}_{3G}$ be the maximum service time of a mobile in the NodeB cell, which is independent of the network state $\eta$. For some $q$ ($0 \le q \le 1$, $q \in \mathbb{R}$), define a decision policy $u(m_c)$ to be the best response of a new mobile against the policy profile $\pi = (u_0, u_1, \ldots)$ followed by all subsequently arriving mobiles [18], as

$$
u(m_c) =
\begin{cases}
1 & : V_{AP}(m_c, \pi) < \tau \\
q & : V_{AP}(m_c, \pi) = \tau \\
0 & : V_{AP}(m_c, \pi) > \tau
\end{cases}
$$

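Further, define a special kind of decision policy, namely the threshold policy, as follows. Given $q$ and $L$ such that $0 \le q \le 1$, $q \in \mathbb{R}$ and $L \ge 0$, $L \in \mathbb{Z}_+$, an $L, q$ threshold policy $u_{L,q}$ is defined as

$$
u_{L,q}(m_c) =
\begin{cases}
1 & : m_c < L \\
q & : m_c = L \\
0 & : m_c > L
\end{cases}
$$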

This $L, q$ threshold policy will be denoted by $[L, q]$ or more compactly by $[g]$, where $g = L + q$. Note that the threshold policies $[L, 1]$ and $[L + 1, 0]$ are identical. We also use the notation $[g]^\infty = [L, q]^\infty$ to denote the policy profile $\pi = ([g], [g], \ldots)$. Now, it has been proved in Lemma 3 in [18] that the optimal best response decision policy $u(m_c)$ for a new mobile, against the policy profile $\pi$ followed by all subsequently arriving mobiles, is actually the threshold policy $[L^*, q^*]$, which can be computed as follows. If $V_{AP}(M_{AP} - 1, [M_{AP}]^\infty) < \tau$, then $L^* = M_{AP}$ and $q^* = 0$. Otherwise, let $L_{\min} = \min\{L \in \mathbb{Z}_+ : V_{AP}(L, [L, 1]^\infty) > \tau\}$. Now, if $V_{AP}(L_{\min}, [L_{\min}, 0]^\infty) \ge \tau$, then the threshold policy is given by $[L^*, q^*] = [L_{\min}, 0]$. Else, if $V_{AP}(L_{\min}, [L_{\min}, 0]^\infty) < \tau$, then it is given by $[L^*, q^*] = [L_{\min}, q^*]$, where $q^*$ is the unique solution of the equation

$$
V_{AP}(L_{\min}, [L_{\min}, q^*]^\infty) = \tau. \tag{9}
$$
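The procedure just described can be sketched in Python. This is an illustrative sketch only, under the following assumptions: `V(L, q)` is a hypothetical user-supplied callable returning $V_{AP}(L, [L, q]^\infty)$, it is continuous and increasing in $q$ so that bisection finds the root of Equation (9), and the function names and the tolerance `tol` are not from the original text. The helper `threshold_policy` mirrors the definition of $u_{L,q}$.

```python
from typing import Callable, Tuple


def threshold_policy(L: int, q: float) -> Callable[[int], float]:
    """Return u_{L,q}: probability of joining the AP with m_c mobiles present."""
    def u(m_c: int) -> float:
        if m_c < L:
            return 1.0
        if m_c == L:
            return q
        return 0.0
    return u


def equilibrium_threshold(
    V: Callable[[int, float], float],  # hypothetical: V(L, q) = V_AP(L, [L, q]^inf)
    tau: float,
    M_AP: int,
    tol: float = 1e-9,
) -> Tuple[int, float]:
    """Compute [L*, q*] following the case distinction in the text."""
    # [M_AP]^inf = [M_AP, 0]^inf is the same profile as [M_AP - 1, 1]^inf.
    if V(M_AP - 1, 1.0) < tau:
        return M_AP, 0.0
    candidates = [L for L in range(M_AP) if V(L, 1.0) > tau]
    if not candidates:
        # Exact tie V = tau at the last state; treated like the first case
        # (an assumption of this sketch).
        return M_AP, 0.0
    L_min = candidates[0]
    if V(L_min, 0.0) >= tau:
        return L_min, 0.0
    # Here V(L_min, 0) < tau < V(L_min, 1): bisect on q to solve Equation (9).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(L_min, mid) < tau:
            lo = mid
        else:
            hi = mid
    return L_min, 0.5 * (lo + hi)
```

For instance, with the purely illustrative toy function `V = lambda L, q: 0.4 * (L + q + 1)` and `tau = 2.5`, the routine selects $L_{\min} = 5$ and bisects to $q^* \approx 0.25$.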

Assuming a state dependent service rate $\mu_{AP}(m_c)$ for a mobile in the AP network, we now compute $V_{AP}(m_c, \pi)$ analytically. At this point we would like to mention that the derivation of the entity equivalent to $V_{AP}(m_c, \pi)$ in [18] is actually erroneous. Moreover, the basic framework in [18] differs from ours, since in our framework we have dedicated arrivals in addition to the common arrivals and we consider a state dependent service rate $\mu_{AP}(m_c)$ for the shared AP server. For notational convenience, if $V(m_c) = V_{AP}(m_c, [L, q]^\infty)$, $0 \le m_c \le M_{AP} - 1$, then it is the solution of the following set of $M_{AP}$ linear equations, where $\alpha := \lambda_{AP} + \lambda_{AP3G} + \mu_{AP}(m_c)$ (dependence of $\alpha$ on $m_c$ has been suppressed in the notation):
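Case 1: $4 \le L \le M_{AP} - 2$,

$$
\begin{aligned}
V(0) &= \frac{1}{\alpha} + \frac{\lambda_{AP} + \lambda_{AP3G}}{\alpha}\, V(1), \\
V(m_c) &= \frac{1}{\alpha} + \frac{\mu_{AP}(m_c)}{\alpha}\, \frac{m_c}{m_c + 1}\, V(m_c - 1) + \frac{\lambda_{AP} + \lambda_{AP3G}}{\alpha}\, V(m_c + 1), \quad 1 \le m_c \le L - 2, \\
V(L - 1) &= \frac{1}{\alpha} + \frac{\mu_{AP}(L - 1)}{\alpha}\, \frac{L - 1}{L}\, V(L - 2) + \frac{\lambda_{AP} + q\, \lambda_{AP3G}}{\alpha}\, V(L) + \frac{\lambda_{AP3G}}{\alpha}\, (1 - q)\, V(L - 1), \\
V(L) &= \frac{1}{\lambda_{AP} + \mu_{AP}(L)} + \frac{\mu_{AP}(L)}{\lambda_{AP} + \mu_{AP}(L)}\, \frac{L}{L + 1}\, V(L - 1) + \frac{\lambda_{AP}}{\lambda_{AP} + \mu_{AP}(L)}\, V(L + 1), \\
V(m_c) &= \frac{1}{\lambda_{AP} + \mu_{AP}(m_c)} + \frac{\mu_{AP}(m_c)}{\lambda_{AP} + \mu_{AP}(m_c)}\, \frac{m_c}{m_c + 1}\, V(m_c - 1) + \frac{\lambda_{AP}}{\lambda_{AP} + \mu_{AP}(m_c)}\, V(m_c + 1), \quad L + 1 \le m_c \le M_{AP} - 2, \\
V(M_{AP} - 1) &= \frac{1}{\mu_{AP}(M_{AP} - 1)} + \frac{M_{AP} - 1}{M_{AP}}\, V(M_{AP} - 2).
\end{aligned}
\tag{10}
$$

Case 2: $L = M_{AP} - 1$,

$$
\begin{aligned}
V(0) &= \frac{1}{\alpha} + \frac{\lambda_{AP} + \lambda_{AP3G}}{\alpha}\, V(1), \\
V(m_c) &= \frac{1}{\alpha} + \frac{\mu_{AP}(m_c)}{\alpha}\, \frac{m_c}{m_c + 1}\, V(m_c - 1) + \frac{\lambda_{AP} + \lambda_{AP3G}}{\alpha}\, V(m_c + 1), \quad 1 \le m_c \le L - 2, \\
V(L - 1) &= \frac{1}{\alpha} + \frac{\mu_{AP}(L - 1)}{\alpha}\, \frac{L - 1}{L}\, V(L - 2) + \frac{\lambda_{AP} + q\, \lambda_{AP3G}}{\alpha}\, V(L) + \frac{\lambda_{AP3G}}{\alpha}\, (1 - q)\, V(L - 1).
\end{aligned}
\tag{11}
$$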
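Since each equation above couples $V(m_c)$ only with $V(m_c - 1)$ and $V(m_c + 1)$, the system is tridiagonal and can be solved directly. The following pure-Python sketch builds the rows as written in (10) and (11) and solves them with the Thomas algorithm, which is a choice made here for self-containment and not necessarily the authors' method. The names `ap_service_times`, `solve_tridiagonal`, `lam_common` and the callable `mu_ap` (standing in for $\mu_{AP}(m_c)$) are hypothetical, and the sketch assumes that the full-network row for $V(M_{AP} - 1)$ also closes the system in Case 2.

```python
from typing import Callable, List


def solve_tridiagonal(lower, diag, upper, rhs) -> List[float]:
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def ap_service_times(
    L: int,
    q: float,
    lam_ap: float,       # dedicated AP arrival rate (lambda_AP)
    lam_common: float,   # common-stream arrival rate (lambda_AP3G)
    mu_ap: Callable[[int], float],  # hypothetical stand-in for mu_AP(m_c)
    M_AP: int,
) -> List[float]:
    """Return [V(0), ..., V(M_AP - 1)] for the profile [L, q]^inf."""
    # Smallest L for which the rows below are distinct (assumption of this sketch).
    assert 2 <= L <= M_AP - 1 and 0.0 <= q <= 1.0
    lower, diag = [0.0] * M_AP, [0.0] * M_AP
    upper, rhs = [0.0] * M_AP, [0.0] * M_AP
    for m in range(M_AP):
        if m == M_AP - 1:
            join, loop = 0.0, 0.0                      # AP full: no admissions
        elif m < L - 1:
            join, loop = lam_ap + lam_common, 0.0      # all arrivals join the AP
        elif m == L - 1:
            join = lam_ap + q * lam_common             # common stream joins w.p. q
            loop = (1.0 - q) * lam_common              # otherwise state is unchanged
        else:
            join, loop = lam_ap, 0.0                   # only dedicated arrivals join
        rate = join + loop + mu_ap(m)
        diag[m] = 1.0 - loop / rate
        if m > 0:
            lower[m] = -(mu_ap(m) / rate) * m / (m + 1)
        if m < M_AP - 1:
            upper[m] = -join / rate
        rhs[m] = 1.0 / rate
    return solve_tridiagonal(lower, diag, upper, rhs)
```

Under these assumptions, $V_{AP}(L, [L, 1]^\infty)$ would be read off as `ap_service_times(L, 1.0, ...)[L]`. As a sanity check, with no arrivals and a constant rate $\mu$ the recursion reduces to $V(m_c) = (m_c + 2)/(2\mu)$.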

The above system of $M_{AP}$ linear equations with $m_c = L$ and $q = 1$ can be solved to obtain $V_{AP}(L, [L, 1]^\infty)$ for different values of $L$. Figure 10 shows an example plot for $C = 10^{-5}$, $\lambda_{AP} = 3$, $M_{AP} = 10$, $M_{3G} = 10$ and other numerical values for various entities in WLAN and UMTS networks being the same as those used in Section 3. Assuming a certain pole capacity $M_{3G}$ of the NodeB cell, $\tau$ can be computed from its definition and the expression for $\mu_{3G}(\eta)$. Knowing $\tau$, one can compute $L_{\min}$ from Figure 10 and then finally $q^*$ from Equation (9). Figure 11 shows a plot of the equilibrium threshold $g^* = L^* + q^*$ against $\lambda_{AP3G}$, with a computed value of $\tau = 2.5$ for $M_{3G} = 10$ and $\lambda_{AP} = 3$. As in [18], the equilibrium threshold has a special structure of a descending staircase with increasing arrival rate ($\lambda_{AP3G}$) of mobiles in the common stream.

Figure 10: $V_{AP}(L, [L, 1]^\infty)$ vs. $L$ for $\lambda_{AP} = 3$ and $M_{AP} = 10$



5 Conclusion 

In this paper, we have considered optimal user-network association or load balancing in an AP-NodeB hybrid cell. We have studied two different and alternate approaches of Global and Individual optimality under SMDP decision control and non-cooperative dynamic game frameworks, respectively. To the best of our knowledge, this study is the first of its kind. Under global optimality, the optimal policy for the common stream of mobiles has a neither convex nor concave type switching curve structure, whereas for the dedicated streams it has a threshold structure. Besides, a mobile-balancing and a mobile-greedy network phenomenon are observed for the common stream. For the analogous AP-AP homogenous network setup, a threshold type and symmetric switching curve is observed. An interesting switching curve is obtained for the NodeB-NodeB homogenous case, where the state space is divided into L-shaped and quadrilateral shaped regions. The optimal policy under the individual optimality model is also observed to be of threshold type, with the threshold curve decreasing in a staircase fashion when plotted against increasing arrival rate of the mobiles of the common stream.

Figure 11: $g^*$ vs. $\lambda_{AP3G}$ for $\lambda_{AP} = 3$, $M_{AP} = 10$ and $\tau = 2.5$